Practical Algorithms for Best-K Identification in Multi-Armed Bandits
Authors
Abstract
In the Best-K identification problem (Best-K-Arm), we are given N stochastic bandit arms with unknown reward distributions. Our goal is to identify the K arms with the largest means with high confidence, by drawing samples from the arms adaptively. This problem is motivated by various practical applications and has attracted considerable attention in the past decade. In this paper, we propose new practical algorithms for the Best-K-Arm problem, which have nearly optimal sample complexity bounds (matching the lower bound up to logarithmic factors) and outperform the state-of-the-art algorithms for the Best-K-Arm problem (even for K = 1) in practice.
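To make the problem statement concrete, here is a minimal sketch of a generic accept/reject elimination strategy for Best-K identification. It is not the algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes Bernoulli rewards, a Hoeffding confidence radius with a union bound over arms and rounds, and a hypothetical function name `best_k_elimination` with hypothetical parameters `delta`, `seed`, and `max_rounds`.

```python
import math
import random


def best_k_elimination(means, k, delta=0.05, seed=0, max_rounds=100_000):
    """Illustrative accept/reject elimination for Best-K identification.

    `means` are the true Bernoulli means, used only to simulate pulls
    (an assumption of this sketch; a real learner never sees them).
    Returns (sorted indices of the chosen K arms, total number of pulls).
    """
    rng = random.Random(seed)
    n = len(means)
    active = set(range(n))   # arms whose status is still undecided
    accepted = set()         # arms already declared to be in the top K
    sums = [0.0] * n         # cumulative reward per arm
    pulls = 0
    t = 0                    # rounds so far; every active arm has t pulls

    while len(accepted) < k and len(active) > k - len(accepted) and t < max_rounds:
        t += 1
        for i in active:
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            pulls += 1

        # Hoeffding radius; the 4 * n * t^2 term is a union bound
        # over all arms and all rounds.
        rad = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))

        need = k - len(accepted)  # slots still to fill from the active set
        ranked = sorted(active, key=lambda i: sums[i] / t, reverse=True)
        last_in = sums[ranked[need - 1]] / t   # empirical mean of need-th best
        first_out = sums[ranked[need]] / t     # empirical mean of (need+1)-th best

        # Accept arms confidently above the first arm outside the top `need`;
        # reject arms confidently below the last arm inside it.
        accept = {i for i in active if sums[i] / t - 2 * rad > first_out}
        reject = {i for i in active if sums[i] / t + 2 * rad < last_in}

        accepted |= accept
        active -= accept | reject

    # Fill any remaining slots with the empirically best active arms
    # (covers both "only `need` arms left" and "round budget exhausted").
    need = k - len(accepted)
    if need > 0:
        ranked = sorted(active, key=lambda i: sums[i] / max(t, 1), reverse=True)
        accepted |= set(ranked[:need])

    return sorted(accepted), pulls


if __name__ == "__main__":
    top, pulls = best_k_elimination([0.9, 0.8, 0.3, 0.2, 0.1], k=2)
    print(top, pulls)
```

Arms are sampled adaptively in the sense that an arm stops being pulled once it is accepted or rejected, so the budget concentrates on arms near the boundary between the K-th and (K+1)-th largest means.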
Similar resources
Risk-Aversion in Multi-armed Bandits
Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off...
Best Arm Identification for Contaminated Bandits
This paper studies active learning in the context of robust statistics. Specifically, we propose the Contaminated Best Arm Identification variant of the multi-armed bandit problem, in which every arm pull has probability ε of generating a sample from an arbitrary contamination distribution instead of the true underlying distribution. The goal is to identify the best (or approximately best) true...
Dynamic Ad Allocation: Bandits with Budgets
We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities). We focus on an important practical issue that advertisers are constrained in how much money they can spend on their ad campaigns. This issue has not been considered in the prior work on bandit-based approaches...
Best arm identification in multi-armed bandits with delayed feedback
We propose a generalization of the best arm identification problem in stochastic multiarmed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework ...
Almost Optimal Exploration in Multi-Armed Bandits
We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in today's large-scale applications. We present two novel, parameter-free algor...
Journal title:
- CoRR
Volume abs/1705.06894 Issue
Pages -
Publication date 2017